1 Introduction

  • Data visualization is an essential component of genomic data analysis. However, the size and diversity of the data sets produced by today’s sequencing and array-based profiling methods present major challenges to visualization tools.

  • The Integrative Genomics Viewer (IGV) is a high-performance viewer that efficiently handles large heterogeneous data sets, while providing a smooth and intuitive user experience at all levels of genome resolution.

  • A key characteristic of IGV is its focus on the integrative nature of genomic studies, with support for both array-based and next-generation sequencing data, and the integration of clinical and phenotypic data.

  • Although IGV is often used to view genomic data from public sources, its primary emphasis is to support researchers who wish to visualize and explore their own data sets or those from colleagues.

  • IGV supports flexible loading of local and remote data sets, and is optimized to provide high-performance data visualization and exploration on standard desktop systems.

  • IGV is freely available for download under a GNU LGPL open-source license.

3 Tutorial

In this tutorial we will learn how to use IGV to visualize genomic data. The first step is to create a directory to store all the tutorial data. It is good practice to create a new directory for each project you work on; this keeps files from getting mixed up and keeps all the results self-contained.

Create a ‘tutorial’ directory to store output files:

bash
mkdir tutorial

Download the tutorial and exercise data:

bash
curl https://raw.githubusercontent.com/zifornd/bioinformatics-workshop/main/data/visualization/data.tar.gz --output tutorial/data.tar.gz

Extract the archive file into the tutorial directory:

bash
tar xf tutorial/data.tar.gz --directory=tutorial
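
Before extracting an archive it can be useful to list its contents, and afterwards to confirm that the expected files arrived. The sketch below illustrates this on a small throwaway archive so that it runs without the download; every path under `$workdir` is a hypothetical stand-in, and in practice you would point the same `tar` calls at tutorial/data.tar.gz.

```shell
# Build a small throwaway archive so the sketch runs offline.
# All paths under $workdir are hypothetical stand-ins for tutorial/data.tar.gz.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/data/example"
printf 'new\nexit\n' > "$workdir/src/data/example/script.txt"
tar -czf "$workdir/data.tar.gz" -C "$workdir/src" data

# List the archive contents without extracting anything.
listing=$(tar -tzf "$workdir/data.tar.gz")
printf '%s\n' "$listing"

# Extract into a target directory, then confirm the expected file exists.
mkdir -p "$workdir/tutorial"
tar -xzf "$workdir/data.tar.gz" -C "$workdir/tutorial"
ls "$workdir/tutorial/data/example"
```

Listing first makes it obvious which top-level directory the archive will create, which helps avoid overwriting existing files.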

3.1 Install IGV

The software we are going to use in this tutorial can be installed using the conda package manager. Please refer to the previous conda workshop for details on installing software and creating conda environments.

Create a new environment with IGV installed:

bash
conda create --yes --name igvtools igv
## Collecting package metadata (current_repodata.json): ...working... done
## Solving environment: ...working... done
## 
## ## Package Plan ##
## 
##   environment location: /opt/miniconda3/envs/igvtools
## 
##   added / updated specs:
##     - igv
## 
## 
## The following NEW packages will be INSTALLED:
## 
##   igv                bioconda/noarch::igv-2.13.2-hdfd78af_0
##   libcxx             conda-forge/osx-64::libcxx-14.0.6-hccf4f1f_0
##   libzlib            conda-forge/osx-64::libzlib-1.2.12-hfd90126_3
##   openjdk            conda-forge/osx-64::openjdk-17.0.3-hbc0c0cd_2
## 
## 
## Preparing transaction: ...working... done
## Verifying transaction: ...working... done
## Executing transaction: ...working... done
## #
## # To activate this environment, use
## #
## #     $ conda activate igvtools
## #
## # To deactivate an active environment, use
## #
## #     $ conda deactivate
## 
## Retrieving notices: ...working... done

Activate the new environment to use it:

bash
conda activate igvtools

Test that the igv command is available:

bash
which igv
## /opt/miniconda3/envs/igvtools/bin/igv

3.2 IGV videos

The developers of IGV have produced a number of tutorial videos that describe the layout and functionality of the browser. Each video is roughly 5 minutes long and contains a lot of useful information. Rather than duplicating that material in a new tutorial, we suggest you watch each of the videos.

3.2.1 Data navigation

This video demonstrates how to navigate the browser:

[Embedded video: data navigation]

3.2.2 Sequencing data

This video demonstrates how to load sequencing data:

[Embedded video: sequencing data]

3.2.3 Genomic variation

This video demonstrates how SNPs and indels are displayed:

[Embedded video: genomic variation]

3.2.4 RNA sequencing

This video demonstrates how RNA-seq data is displayed:

[Embedded video: RNA sequencing]

3.2.5 Variation calling

This video demonstrates how variant calls are displayed:

[Embedded video: variation calling]

3.3 Batch commands

In some cases it is useful to control IGV programmatically, rather than interactively. This is handy when you want to perform lots of tasks in the session without having to manually load and navigate the browser. Tasks are issued using a batch script, a text file containing commands which the browser understands. The commands are run sequentially and appear on separate lines.

Below is an example of a batch script. The script creates a new session, performs some tasks, and then exits IGV:

bash
cat tutorial/data/example/script.txt
## new
## genome hg19
## goto chr12:7,939,997-7,953,742
## snapshot tutorial/data/example/snapshot.png
## exit

The script is run by passing it to the IGV launcher with the batch parameter:

bash
igv --batch tutorial/data/example/script.txt
## Using system JDK.
## WARNING: package com.sun.java.swing.plaf.windows not in java.desktop
## WARNING: package sun.awt.windows not in java.desktop
## openjdk version "17.0.3" 2022-04-19 LTS
## OpenJDK Runtime Environment Zulu17.34+19-CA (build 17.0.3+7-LTS)
## OpenJDK 64-Bit Server VM Zulu17.34+19-CA (build 17.0.3+7-LTS, mixed mode, sharing)
## INFO [Sept 24,2022 09:55] [Main] Startup  IGV Version user not_set
## INFO [Sept 24,2022 09:55] [Main] Java 17.0.3 (build 17.0.3+7-LTS) 2022-04-19
## INFO [Sept 24,2022 09:55] [Main] Java Vendor: Azul Systems, Inc. http://www.azul.com/
## INFO [Sept 24,2022 09:55] [Main] JVM: OpenJDK 64-Bit Server VM Zulu17.34+19-CA   
## INFO [Sept 24,2022 09:55] [Main] OS: Mac OS X 12.6 x86_64
## INFO [Sept 24,2022 09:55] [Main] IGV Directory: /Users/James/igv
## SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
## SLF4J: Defaulting to no-operation (NOP) logger implementation
## SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
## INFO [Sept 24,2022 09:55] [AmazonUtils] AWS default credentials found. AWS support enabled.
## INFO [Sept 24,2022 09:55] [CommandListener] Listening on port 60151
## INFO [Sept 24,2022 09:55] [BatchRunner] Executing batch script: tutorial/data/example/script.txt
## INFO [Sept 24,2022 09:55] [GenomeManager] Loading genome: https://s3.amazonaws.com/igv.org.genomes/hg19/hg19.json
## INFO [Sept 24,2022 09:56] [TrackLoader] Loading resource:  https://s3.amazonaws.com/igv.org.genomes/hg19/ncbiRefSeq.sorted.txt.gz
## INFO [Sept 24,2022 09:56] [ShutdownThread] Shutting down

The browser saves a snapshot of the NANOG locus in the human genome to tutorial/data/example/snapshot.png:

[Figure: IGV snapshot of chr12:7,939,997-7,953,742]

This example is fairly simple, but more complicated tasks can be achieved with additional commands. The full list of batch commands is displayed below.
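
As one illustration of a slightly larger task, a batch script can be generated from a list of loci rather than written by hand. The following is a minimal sketch, not part of the tutorial data: the second locus, the output location, and the snapshot file names are all hypothetical. It only writes the script; running it would still require `igv --batch`.

```shell
# Hypothetical inputs: a list of loci and an output location for the script.
workdir=$(mktemp -d)
script="$workdir/snapshots.txt"
loci="chr12:7,939,997-7,953,742 chr12:7,900,000-7,910,000"

# Write the batch script: start a session, pick a genome, and set the
# directory that snapshots are written to.
{
  printf 'new\n'
  printf 'genome hg19\n'
  printf 'snapshotDirectory %s\n' "$workdir"
  # One goto/snapshot pair per locus; the file name is derived from the locus
  # by dropping commas and replacing the colon with an underscore.
  for locus in $loci; do
    name=$(printf '%s' "$locus" | tr -d ',' | tr ':' '_')
    printf 'goto %s\n' "$locus"
    printf 'snapshot %s.png\n' "$name"
  done
  printf 'exit\n'
} > "$script"

cat "$script"
# The generated script could then be run with: igv --batch "$script"
```

Generating the script keeps the locus list in one place, so adding a region means editing a single variable rather than two lines of the batch file.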

3.4 Reference sheet

| Command | Parameters | Description |
| --- | --- | --- |
| collapse | trackName | Collapses a given track. trackName is optional; if it is not supplied, all tracks are collapsed. |
| colorBy | option tagName | Sets the "color by" option for alignment tracks. For option TAG, also specify the tag name. |
| echo | | Writes "echo" back to the response; primarily for testing port connections. |
| exit | | Exits (closes) the IGV application. |
| expand | trackName | Expands a given track. trackName is optional; if it is not supplied, all tracks are expanded. |
| genome | genomeIdOrPath | Selects a genome by id, or loads a genome (or indexed fasta) from the supplied path. |
| goto | locus or listOfLoci | |
| group | option tagName | Alignment tracks only. Groups alignments by the specified option. See the IGV documentation for valid option values. For option TAG, also specify the tag name. |
| load | file | Loads a data or session file by specifying a full path to a local file or a URL. To explicitly specify a path to an index file, use the optional index= parameter. For example: load foo.bam index=bar.bai |
| maxPanelHeight | height | Sets the number of vertical pixels (height) of each panel to include in an image. Images created from a port command or batch script are not limited to the data visible on the screen; they can include the entire panel, not just the portion visible in the scrollable screen area. The default value is 1000; increase it to see more data, or decrease it to create smaller images. To capture the exact area visible on the screen, set this value to -1. |
| new | | Creates a new session. Unloads all tracks except the default genome annotations. |
| preference | key value | Temporarily sets the preference named key to the specified value. The preference only lasts until IGV is shut down. The complete set of preference keys is listed in the file preferences.tab. The first column is the key and the third column is the value type. For select value types, the permitted values follow as a list delimited by the pipe character. |
| region | chr start end | Defines a region of interest bounded by the two loci (e.g., region chr1 100 200). |
| saveSession | filename | Saves the current session. It is recommended that a full path be used for filename. (IGV release 2.11.1) |
| setAltColor | colorString trackName | Sets the track altColor, used for negative values in a wig track or negative strand features. See the description of setColor. (IGV release 2.11.1) |
| setColor | colorString trackName | Sets the track color. colorString can be a comma-delimited RGB string with components in the range 0-255, for example 255,0,0, or a hex color string, for example FF0000. (IGV release 2.11.1) |
| setDataRange | rangeString trackName | Sets the data range (scale) for all numeric tracks or, if a trackName is specified, for that track. rangeString is a comma-delimited list of min,max. As of release 2.11.0, 'auto' can be used for rangeString, which sets the track(s) to autoscale. |
| setLogScale | true/false trackName | Sets the data scale to log (true) or linear (false). Optionally specify a track; if no track is specified, all numeric tracks are set. |
| setSleepInterval | ms | Sets a delay (sleep) time in milliseconds. The sleep interval is invoked between successive commands. |
| setTrackHeight | height trackName | Sets the specified track's height in integer units. trackName is required. |
| snapshotDirectory | path | Sets the directory in which to write images. |
| snapshot | filename | Saves a snapshot of the IGV window to an image file. If filename is omitted, writes a PNG file with a filename generated from the locus. If filename is specified, the filename extension determines the image file format, which must be either .png or .svg. |
| sort | option locus | Sorts alignment or segmented copy number tracks. See the IGV documentation for valid option values. If supplied, locus can define a single position or a range. If absent, sorting is based on the region in view for segmented copy number tracks, or on the center position of the region in view for alignment tracks. |
| squish | trackName | Squishes a given track. trackName is optional; if it is not supplied, all annotation tracks are squished. |
| viewaspairs | trackName | Sets the display mode for an alignment track to "View as pairs". trackName is optional. |
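
Because a batch script is plain text, a quick check can catch misspelled commands before IGV is launched. The sketch below compares the first word of each line against the command names in the reference sheet above. It is an illustrative helper rather than an IGV feature: the example script and its deliberate typo ("snapshoot") are hypothetical, and the check assumes exact, case-sensitive command names, which may be stricter than IGV itself.

```shell
# Command names copied from the reference sheet above.
known="collapse colorBy echo exit expand genome goto group load maxPanelHeight new preference region saveSession setAltColor setColor setDataRange setLogScale setSleepInterval setTrackHeight snapshotDirectory snapshot sort squish viewaspairs"

# Hypothetical batch script containing one misspelled command ("snapshoot").
workdir=$(mktemp -d)
printf 'new\ngenome hg19\n\ngoto chr12:7,939,997-7,953,742\nsnapshoot out.png\nexit\n' > "$workdir/script.txt"

# Report every non-empty line whose first word is not a known command.
unknown=$(awk -v known="$known" '
  BEGIN { n = split(known, names, " "); for (i = 1; i <= n; i++) ok[names[i]] = 1 }
  NF > 0 && !($1 in ok) { printf "line %d: unknown command \"%s\"\n", NR, $1 }
' "$workdir/script.txt")
printf '%s\n' "$unknown"
```

An empty result means every line starts with a command from the sheet; it does not validate the parameters that follow.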

4 Exercises

The exercises below are designed to strengthen your knowledge of using IGV and writing batch commands. The solution to each problem is blurred; look at it only after attempting to solve the problem yourself. Should you need any help, please ask one of the instructors.

4.1 Protein-DNA binding

ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest.

In the chipseq directory is a set of files generated from the analysis of a ChIP-seq experiment:

  • ZHBTC4_OCT4_UNT.bw - Coverage track for the DNA-binding protein OCT4
  • ZHBTC4_Input_UNT.bw - Coverage track for the input genomic DNA
  • ZHBTC4_OCT4_UNT_peaks.narrowPeak - Peak regions detected by MACS2 peak caller
  • ZHBTC4_OCT4_UNT_summits.bed - Summit regions detected by MACS2 peak caller

All of the data files are compatible with the mm10 mouse reference genome.

Using both the IGV browser and the command line, answer the following questions:

  1. Which peak has the highest score? Type man sort for help sorting.
bash
# ZHBTC4_OCT4_UNT_peak_24
sort -k5,5nr tutorial/data/chipseq/ZHBTC4_OCT4_UNT_peaks.narrowPeak | head -n 1
## chr19    7261376 7262037 ZHBTC4_OCT4_UNT_peak_24 1690    .   36.5612 176.877 169.089 326
  2. What gene is nearest to the highest scoring peak?
R
# Rcor2
  3. Which peak is the longest? Type man awk for help calculating.
bash
# ZHBTC4_OCT4_UNT_peak_149
awk '{print $1, $2, $3, $4, $3 - $2}' tutorial/data/chipseq/ZHBTC4_OCT4_UNT_peaks.narrowPeak | sort -k5,5nr | head -n 1
## chr19 38347912 38349082 ZHBTC4_OCT4_UNT_peak_149 1170
  4. What base is covered by the summit of the longest peak?
R
# G
  5. How many peaks are located inside the Cep55 gene?
R
# 3
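
The interval and summit questions above can also be approached from the command line. The following is a minimal sketch on a toy narrowPeak file: the peak names, coordinates, and the gene interval are made up for illustration and are not taken from the tutorial data. It counts peaks that overlap an interval (a looser criterion than lying fully inside it) and derives each summit position from the peak start plus the summit offset stored in column 10 of the narrowPeak format.

```shell
# Toy narrowPeak file (tab-separated). The peak names and coordinates are
# made up for illustration and do not come from the tutorial data.
workdir=$(mktemp -d)
peaks="$workdir/toy.narrowPeak"
printf 'chr19\t100\t400\tpeak_1\t250\t.\t5.0\t30.0\t25.0\t150\n'  > "$peaks"
printf 'chr19\t900\t1500\tpeak_2\t900\t.\t9.0\t95.0\t90.0\t300\n' >> "$peaks"
printf 'chr19\t2000\t2200\tpeak_3\t120\t.\t3.0\t15.0\t12.0\t80\n' >> "$peaks"
printf 'chr1\t950\t1000\tpeak_4\t500\t.\t6.0\t55.0\t50.0\t20\n'   >> "$peaks"

# Hypothetical gene interval to test against.
chrom="chr19"; start=300; end=1000

# Count peaks on the same chromosome that overlap the half-open interval [start, end).
count=$(awk -v c="$chrom" -v s="$start" -v e="$end" \
  '$1 == c && $2 < e && $3 > s { n++ } END { print n + 0 }' "$peaks")
echo "Peaks overlapping $chrom:$start-$end: $count"

# Summit position = peak start (column 2) + summit offset (column 10).
summits=$(awk 'BEGIN { OFS = "\t" } { print $4, $1, $2 + $10 }' "$peaks")
printf '%s\n' "$summits"
```

To require peaks to lie entirely within the interval, the condition would become `$2 >= s && $3 <= e`. The real gene coordinates would still need to be looked up, for example in IGV.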

4.2 Genomic variation

Exome sequencing, also known as whole exome sequencing (WES), is a genomic technique for sequencing all of the protein-coding regions of genes in a genome (known as the exome). It consists of two steps: the first step is to select only the subset of DNA that encodes proteins. These regions are known as exons. The second step is to sequence the exonic DNA using any high-throughput DNA sequencing technology.

In the exome directory is a VCF file generated from the analysis of a whole-exome sequencing (WES) experiment. The file is called 1KGP.vcf and contains variant calling information from a number of samples from the 1000 Genomes Project data portal. The VCF file is compatible with the hg38 human reference genome.

Load the VCF file in the IGV browser and answer the following questions:

  1. How many samples are represented in the VCF file?
R
# 10
  2. What are the reference and alternate alleles at position chr19:4196493?
R
# Reference: A
# Alternate: G
  3. How many SNPs are present in the ACP7 gene?
R
# 1
# The first SNP is homozygous reference
  4. What two positions show homozygous variants in all 10 samples?
R
# chr19:9387798
# chr19:55999276
  5. How many samples have each of the following genotypes at position chr19:4355511?
  • Homozygous reference
  • Homozygous variant
  • Heterozygous variant
R
# Homozygous reference: 1
# Homozygous variant: 3
# Heterozygous variant: 6
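
The sample and genotype counts can be cross-checked on the command line as well as in the browser. The sketch below works on a toy VCF with three hypothetical samples (S1 to S3), not on 1KGP.vcf. It assumes diploid, biallelic, non-missing genotypes with GT as the first FORMAT subfield, so treat it as a starting point rather than a general VCF parser.

```shell
# Toy VCF with three hypothetical samples (S1-S3); this is not 1KGP.vcf.
workdir=$(mktemp -d)
vcf="$workdir/toy.vcf"
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'chr19\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1\n'
  printf 'chr19\t200\t.\tC\tT\t50\tPASS\t.\tGT\t1/1\t1/1\t1|1\n'
} > "$vcf"

# Sample columns start at field 10 of the #CHROM header line.
samples=$(awk -F'\t' '/^#CHROM/ { print NF - 9; exit }' "$vcf")
echo "Samples: $samples"

# Tally genotype classes at one position (here POS 100).
# Assumes diploid, biallelic, non-missing genotypes.
genotypes=$(awk -F'\t' -v pos=100 '
  !/^#/ && $2 == pos {
    for (i = 10; i <= NF; i++) {
      split($i, f, ":")          # GT is the first FORMAT subfield
      split(f[1], a, "[/|]")     # alleles are separated by a slash or a pipe
      if (a[1] == a[2]) { if (a[1] == 0) ref++; else hom++ } else het++
    }
    printf "hom_ref=%d het=%d hom_alt=%d\n", ref, het, hom
  }' "$vcf")
echo "$genotypes"
```

For the tutorial file, the same two `awk` calls could be pointed at the 1KGP.vcf file in the exome directory with `pos` set to the position of interest, keeping in mind that the position filter here ignores the chromosome column.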

4.3 Gene expression

RNA-seq (short for RNA sequencing) is a technique that uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment, capturing the continuously changing cellular transcriptome.

In the rnaseq directory is a set of coverage tracks generated from the analysis of an RNA-seq experiment. All of the data files are compatible with the mm10 mouse reference genome. Write a batch script to recreate the snapshot shown below:

[Figure: target IGV snapshot for this exercise]

Solution:

verbatim
new

genome mm10

goto chr19:8,987,120-9,078,935 chr19:38,052,980-38,076,424 chr19:27,386,697-27,431,908

load tutorial/data/rnaseq/BRG1FL_TAM.bw
setColor #FF0000 BRG1FL_TAM.bw

load tutorial/data/rnaseq/BRG1FL_UNT.bw
setColor #0000FF BRG1FL_UNT.bw

load tutorial/data/rnaseq/ZHBTC4_DOX.bw
setColor #008000 ZHBTC4_DOX.bw

load tutorial/data/rnaseq/ZHBTC4_UNT.bw
setColor #800080 ZHBTC4_UNT.bw

setDataRange 0,50

snapshot exercises/expression/snapshot.png

exit